Final Report


ITERATIVE METHODS FOR LARGE SCALE STATIC ANALYSIS OF 
STRUCTURES ON A SCALABLE MULTIPROCESSOR SUPERCOMPUTER 

NAHIL ATEF SOBH¹

A parallel Preconditioned Conjugate Gradient (PCG) iterative solver has been developed
and implemented on the iPSC-860 scalable hypercube. This new implementation uses
the PARTI (Parallel Automated Runtime Toolkit at ICASE) primitives to efficiently
program the irregular communication patterns that arise in general sparse matrices, and
in finite element stiffness matrices in particular.
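Runtime libraries of this kind, PARTI among them, are usually described in terms of an inspector/executor split: an inspector scans the sparsity pattern once to find which off-processor entries each processor needs, and an executor reuses that schedule in every iteration. The sketch below simulates this in a single process. The names `Schedule`, `inspect` and `gather` are hypothetical and are not PARTI routines; the local CSR layout is also an assumption.

```c
#include <stdlib.h>

/* Communication schedule built by the inspector (hypothetical type). */
typedef struct {
    int nghost;   /* number of distinct off-processor indices */
    int *ghost;   /* their global indices, in first-seen order */
} Schedule;

/* Inspector: this processor owns global rows lo..hi-1, stored as a
 * local CSR block (rowptr indexed 0..hi-lo, col holding global column
 * indices).  Any column outside [lo, hi) lives on another processor.
 * Run once per sparsity pattern; n is the global number of equations. */
Schedule inspect(const int *rowptr, const int *col, int lo, int hi, int n)
{
    Schedule s;
    char *seen = calloc((size_t)n, 1);
    s.nghost = 0;
    s.ghost = malloc((size_t)n * sizeof *s.ghost);
    for (int i = 0; i < hi - lo; i++)
        for (int k = rowptr[i]; k < rowptr[i + 1]; k++) {
            int j = col[k];
            if ((j < lo || j >= hi) && !seen[j]) {
                seen[j] = 1;               /* record each ghost index once */
                s.ghost[s.nghost++] = j;
            }
        }
    free(seen);
    return s;
}

/* Executor: reuse the schedule in every PCG iteration to fetch the
 * off-processor entries of p needed by the local part of Kp.  Remote
 * memory is simulated by a global array; on a distributed-memory
 * machine this step would be a message exchange. */
void gather(const Schedule *s, const double *global_p, double *ghost_vals)
{
    for (int g = 0; g < s->nghost; g++)
        ghost_vals[g] = global_p[s->ghost[g]];
}
```

For a 4-equation tridiagonal matrix, a processor owning rows 1 and 2 needs only entries 0 and 3 from its neighbours; the inspector finds them once, so the per-iteration cost is just the gather.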

The PCG solver has been used to solve the finite element equations that result
from discretizing large-scale aerospace structures. In particular, the static response of
the High Speed Civil Transport (HSCT) finite element model cited in reference [3] is
computed on the iPSC-860.

The preconditioned conjugate gradient algorithm implemented on the iPSC is
outlined below.

We have parallelized the following three crucial steps:

1. The matrix-vector multiplication ($Kp$).

2. The inner products ($z^T r$ and $p^T (Kp)$).

3. The solution of the preconditioning step ($Mz = r$).
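Of the three steps, the diagonal preconditioning solve is purely local, the matrix-vector product needs neighbouring (off-processor) entries of $p$, and each inner product needs a global sum over all processors. The sketch below illustrates the inner product on a hypercube: each node forms a local partial sum, then the nodes combine them by recursive doubling across the cube dimensions. It is simulated in one process; `hypercube_dot`, `local_dot` and the `MAX_NODES` limit are hypothetical names, not the report's code or a vendor library call.

```c
#include <stddef.h>

#define MAX_NODES 64   /* hypothetical limit for this simulation */

/* Local part of an inner product: each node touches only its slice. */
static double local_dot(const double *x, const double *y, size_t lo, size_t hi)
{
    double s = 0.0;
    for (size_t i = lo; i < hi; i++)
        s += x[i] * y[i];
    return s;
}

/* Inner product of two n-vectors distributed over P = 2^d simulated
 * hypercube nodes (P must be a power of two, at most MAX_NODES).
 * Node p owns the contiguous slice [p*n/P, (p+1)*n/P). */
double hypercube_dot(const double *x, const double *y, size_t n, int P)
{
    double sum[MAX_NODES], next[MAX_NODES];

    for (int p = 0; p < P; p++)   /* no communication needed here */
        sum[p] = local_dot(x, y, (size_t)p * n / P, (size_t)(p + 1) * n / P);

    /* Recursive doubling: in the step for dimension bit, node p swaps
     * its running total with neighbour p ^ bit and both add.  After
     * log2(P) exchanges every node holds the same global sum. */
    for (int bit = 1; bit < P; bit <<= 1) {
        for (int p = 0; p < P; p++)
            next[p] = sum[p] + sum[p ^ bit];
        for (int p = 0; p < P; p++)
            sum[p] = next[p];
    }
    return sum[0];
}
```

The global sum costs $\log_2 P$ message exchanges per inner product, and the algorithm below needs two inner products per iteration, so this cost does not shrink as processors are added.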


¹ Principal Investigator; Assistant Professor, Department of Mechanical Engineering and Mechanics.


1. Choose $u_0$  (Initialize displacements)

2. $r_0 = f - K u_0$  (Initialize residual, the ‘out of balance forces’)

3. for $k = 1, 2, \ldots$ do  (Iteration loop on $k$)

4. if $\|r_{k-1}\| <$ tolerance, exit  (If equilibrium is satisfied, exit)

5. Solve $M z_{k-1} = r_{k-1}$  (Solve for the iteration vector)

6. $\beta_k = \frac{z_{k-1}^T r_{k-1}}{z_{k-2}^T r_{k-2}}$  ($\beta_1 = 0$)  (Evaluate the scaling parameter)

7. $p_k = z_{k-1} + \beta_k p_{k-1}$  ($p_1 = z_0$)  (Update search direction)

8. $\alpha_k = \frac{z_{k-1}^T r_{k-1}}{p_k^T K p_k}$  (Evaluate new step length)

9. $u_k = u_{k-1} + \alpha_k p_k$  (Update displacements)

10. $r_k = r_{k-1} - \alpha_k K p_k$  (Update residuals, the ‘out of balance forces’)

11. end of for loop  (go to 3)
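A minimal serial sketch of steps 1 to 11, using the point-Jacobi preconditioner $M = \mathrm{diag}(K)$. The CSR storage, the names `CSR`, `matvec`, `dot` and `pcg_jacobi`, and the 2-norm convergence test are assumptions for illustration; this is not the appendix listing. In the parallel version, `matvec` would also need the off-processor entries of $p$, and each `dot` a global sum across processors.

```c
#include <stdlib.h>

typedef struct {
    int n;               /* number of equations */
    const int *rowptr;   /* row i occupies entries rowptr[i] .. rowptr[i+1]-1 */
    const int *col;      /* column index of each stored nonzero */
    const double *val;   /* value of each stored nonzero */
} CSR;

static void matvec(const CSR *K, const double *x, double *y)
{
    for (int i = 0; i < K->n; i++) {
        double s = 0.0;
        for (int k = K->rowptr[i]; k < K->rowptr[i + 1]; k++)
            s += K->val[k] * x[K->col[k]];
        y[i] = s;
    }
}

static double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* u holds u_0 on entry and the solution on return.  Returns the number
 * of iterations used, or -1 if ||r|| < tol is not reached within maxit
 * iterations.  Assumes every diagonal entry of K is stored and nonzero. */
int pcg_jacobi(const CSR *K, const double *f, double *u, double tol, int maxit)
{
    int n = K->n, result = -1;
    double *r = malloc(n * sizeof *r);
    double *z = malloc(n * sizeof *z);
    double *p = calloc(n, sizeof *p);      /* p_0 = 0, so step 7 gives p_1 = z_0 */
    double *Kp = malloc(n * sizeof *Kp);
    double *dinv = malloc(n * sizeof *dinv);
    double rz_old = 1.0;                   /* unused when k == 1 (beta_1 = 0) */

    for (int i = 0; i < n; i++)            /* M^{-1} = diag(K)^{-1} */
        for (int k = K->rowptr[i]; k < K->rowptr[i + 1]; k++)
            if (K->col[k] == i)
                dinv[i] = 1.0 / K->val[k];

    matvec(K, u, Kp);                      /* step 2: r_0 = f - K u_0 */
    for (int i = 0; i < n; i++)
        r[i] = f[i] - Kp[i];

    for (int k = 1;; k++) {                /* step 3 */
        if (dot(r, r, n) < tol * tol) {    /* step 4: ||r_{k-1}|| < tol */
            result = k - 1;
            break;
        }
        if (k > maxit)
            break;
        for (int i = 0; i < n; i++)        /* step 5: solve M z = r */
            z[i] = dinv[i] * r[i];
        double rz = dot(z, r, n);
        double beta = (k == 1) ? 0.0 : rz / rz_old;   /* step 6 */
        for (int i = 0; i < n; i++)        /* step 7 */
            p[i] = z[i] + beta * p[i];
        matvec(K, p, Kp);
        double alpha = rz / dot(p, Kp, n); /* step 8 */
        for (int i = 0; i < n; i++) {
            u[i] += alpha * p[i];          /* step 9 */
            r[i] -= alpha * Kp[i];         /* step 10 */
        }
        rz_old = rz;
    }
    free(r); free(z); free(p); free(Kp); free(dinv);
    return result;
}
```

For example, on the 5-equation system with 2 on the diagonal, -1 on the off-diagonals and $f = (1, \ldots, 1)$, starting from $u_0 = 0$, the iterates converge to $(2.5, 4, 4.5, 4, 2.5)$.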


Good preconditioners are those that approximate, or mimic the action of, the original
operator. Thus the preconditioning step in the PCG algorithm involves solving a system,
$Mz = r$, that is very close to the original system, $Ku = f$.
The communication overhead imposed by MIMD machines limits the use of
highly coupled direct solvers for this step. We have therefore used an approximation to $K$
with a loosely coupled structure: in particular, a diagonal (point Jacobi)
preconditioner. Jacobi-type preconditioners parallelize very well on both MIMD and SIMD
supercomputer architectures. Although they may look simple, it has been shown that their
overall performance can be very competitive with that of other preconditioners. We
have also implemented a block SSOR family of preconditioners. SSOR preconditioners
accelerate the convergence rate, reducing the number of iterations and hence the total
communication, at the expense of extra computation per iteration. Current work focuses
on this trade-off between communication and computation when using SSOR preconditioners.
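The report does not spell out its block SSOR variant, so the sketch below shows plain point SSOR on one process, with $K = L + D + U$ and $M = \frac{1}{\omega(2-\omega)}(D + \omega L)D^{-1}(D + \omega U)$ for $0 < \omega < 2$. Applying $M^{-1}$ costs one forward and one backward triangular sweep, roughly twice the work of the diagonal solve's matvec-sized pass. Because each sweep is sequential in the row index, a common parallel arrangement applies such sweeps only within each processor's block of rows; that reading of "block", the CSR layout, and the names `CSR` and `ssor_apply` are assumptions.

```c
#include <stddef.h>

typedef struct {
    int n;
    const int *rowptr;   /* CSR row pointers, length n + 1 */
    const int *col;      /* column index of each stored nonzero */
    const double *val;   /* value of each stored nonzero */
} CSR;

/* Solve M z = r for the point-SSOR preconditioner
 *   M = (D + wL) D^{-1} (D + wU) / (w (2 - w)),   0 < w < 2.
 * y is caller-supplied workspace of length n.  Assumes every diagonal
 * entry of K is stored and nonzero. */
void ssor_apply(const CSR *K, double w, const double *r, double *y, double *z)
{
    int n = K->n;

    /* Forward sweep: (D + wL) y = w (2 - w) r */
    for (int i = 0; i < n; i++) {
        double s = w * (2.0 - w) * r[i], d = 1.0;
        for (int k = K->rowptr[i]; k < K->rowptr[i + 1]; k++) {
            int j = K->col[k];
            if (j < i)
                s -= w * K->val[k] * y[j];
            else if (j == i)
                d = K->val[k];
        }
        y[i] = s / d;
    }

    /* Backward sweep: (D + wU) z = D y */
    for (int i = n - 1; i >= 0; i--) {
        double s = 0.0, d = 1.0;
        for (int k = K->rowptr[i]; k < K->rowptr[i + 1]; k++) {
            int j = K->col[k];
            if (j > i)
                s -= w * K->val[k] * z[j];
            else if (j == i)
                d = K->val[k];
        }
        z[i] = (d * y[i] + s) / d;
    }
}
```

With $\omega = 1$ this is symmetric Gauss-Seidel; for a diagonal $K$ it reduces to a scaled Jacobi step, $z = \omega(2-\omega) D^{-1} r$.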


A computer listing of the C program is given in the appendix.
This research has accomplished the following tasks:

1. Implemented a conjugate gradient iterative solver for symmetric sparse matrices
on the Intel iPSC-860 hypercube at ICASE.

2. Implemented SSOR and point-Jacobi preconditioners to accelerate convergence of the
conjugate gradient iterative solver and reduce communication.

3. Generated large-scale finite element models of structures using the COMET software
at NASA Langley Research Center, and performed linear static analyses of these
structural models on the Intel iPSC-860 at ICASE/NASA Langley.

4. Implemented general-purpose software that generates full stiffness matrices
from the symmetric (half-stored) part produced by COMET.
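Task 4 amounts to mirroring each off-diagonal entry of the stored half into the other triangle. The sketch below assumes the half is supplied as upper-triangle triplets $(i, j, k_{ij})$ with $i \le j$ and 0-based indices; COMET's actual output format is not described here, and `expand_symmetric` is a hypothetical name. It builds full CSR storage with a counting pass, a prefix sum, and a fill pass.

```c
#include <stdlib.h>

/* Expand a symmetric matrix given as upper-triangle triplets
 * (ti[k] <= tj[k]) into full CSR storage.  On return *rowptr (n + 1
 * entries), *col and *val (nnz entries) are malloc'ed by this function.
 * Columns within a row appear in input order, not sorted. */
int expand_symmetric(int n, int nt, const int *ti, const int *tj,
                     const double *tv, int **rowptr, int **col, double **val)
{
    int *rp = calloc((size_t)n + 1, sizeof *rp);
    for (int k = 0; k < nt; k++) {               /* count entries per row */
        rp[ti[k] + 1]++;
        if (ti[k] != tj[k])
            rp[tj[k] + 1]++;                     /* mirrored entry */
    }
    for (int i = 0; i < n; i++)                  /* prefix sum -> row starts */
        rp[i + 1] += rp[i];

    int nnz = rp[n];
    int *c = malloc((size_t)nnz * sizeof *c);
    double *v = malloc((size_t)nnz * sizeof *v);
    int *next = malloc((size_t)n * sizeof *next);  /* fill cursor per row */
    for (int i = 0; i < n; i++)
        next[i] = rp[i];

    for (int k = 0; k < nt; k++) {
        int i = ti[k], j = tj[k];
        c[next[i]] = j;
        v[next[i]++] = tv[k];
        if (i != j) {                            /* K(j,i) = K(i,j) */
            c[next[j]] = i;
            v[next[j]++] = tv[k];
        }
    }
    free(next);
    *rowptr = rp;
    *col = c;
    *val = v;
    return nnz;
}
```

A half with $m$ off-diagonal entries and $d$ diagonal entries expands to $2m + d$ stored nonzeros, which is the storage the parallel matrix-vector product then works from.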


No. of        CPU time (seconds)          CPU time (seconds)
processors    diagonal preconditioning    SSOR preconditioning

High Speed Civil Transport (HSCT), 16,146 equations
   8          1342                        (not reported)
  16           786                         764
  32           565                         438

Blade Stiffened Panel (BSP), 1,824 equations
   1           2.5                         3.0
   2           2.1                         2.1
   4           1.8                         1.4
   8           1.9                         1.3
  16           1.7                         1.2
The table above summarizes the performance of the preconditioned conjugate
gradient iterative solver. The first column is the number of processors used to solve
each problem; the second column is the CPU time in seconds required to solve the
problem using the diagonal of the stiffness matrix as the preconditioner; the third
column is the CPU time in seconds required using a symmetric successive
over-relaxation (SSOR) preconditioner.
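The CPU times in the table can be turned into speedup and parallel efficiency. Since the HSCT problem has no single-processor run, the sketch measures both against the smallest processor count reported for each problem; `speedup` and `efficiency` are illustrative helper names.

```c
/* Speedup and parallel efficiency of a run on p processors measured
 * against a base run on p0 processors:
 *   S = T(p0) / T(p),   E = S * p0 / p   (E = 1 is ideal scaling). */
double speedup(double t_base, double t_p)
{
    return t_base / t_p;
}

double efficiency(double t_base, int p0, double t_p, int p)
{
    return speedup(t_base, t_p) * p0 / p;
}
```

For example, SSOR on the HSCT model from 16 to 32 processors gives $S = 764/438 \approx 1.74$ and $E \approx 0.87$, while diagonal preconditioning on the small BSP model from 1 to 16 processors gives $S = 2.5/1.7 \approx 1.47$ and $E \approx 0.09$, reflecting how little work each processor has on a 1,824-equation problem.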

The speedup on the large problem (HSCT) is modest: with diagonal preconditioning,
going from 8 to 32 processors reduces the CPU time from 1342 to 565 seconds, a speedup
of about 2.4 against an ideal of 4. The speedup is expected to improve as the number of
equations increases. Although one may think that a direct solver would be much more
efficient on these problems, one should not ignore the fact that direct solvers require
extensive storage, which limits the number of equations that can be solved. Iterative
solvers will therefore find their most efficient use on very large problems whose size
exceeds the core memory available on a given supercomputer. For direct methods, this
memory limitation forces the use of out-of-core solvers, which are slow compared to
iterative solvers and are not easy to parallelize. Efforts should therefore concentrate on
developing efficient parallel out-of-core solvers, and iterative solvers with an emphasis on
parallelizable preconditioners, to make efficient use of future massively parallel
supercomputers.